There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction
Abstract
Current methods for automatically evaluating grammatical error correction (GEC) systems rely on gold-standard references. However, these methods penalize grammatical edits that are correct but do not appear in the gold standard. We show that reference-less grammaticality metrics correlate very strongly with human judgments and are competitive with the leading reference-based evaluation metrics. By interpolating both methods, we achieve state-of-the-art correlation with human judgments. Finally, we show that GEC metrics are much more reliable when they are calculated at the sentence level instead of the corpus level. We have set up a CodaLab site for benchmarking GEC output using a common dataset and different evaluation metrics.
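The abstract does not say how the two kinds of metric are combined or how scores are aggregated. The sketch below is a minimal illustration under stated assumptions, not the paper's method or any specific GEC metric. It assumes a simple linear interpolation of per-sentence scores with a hypothetical weight `weight`. It also contrasts sentence-level averaging with corpus-level pooling using a toy precision-style ratio (matched edits over proposed edits). The function names are hypothetical.

```python
from statistics import mean


def interpolate(ref_score: float, refless_score: float, weight: float) -> float:
    """Hypothetical linear mix: weight * reference-based + (1 - weight) * reference-less."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * ref_score + (1.0 - weight) * refless_score


def system_score(ref_scores, refless_scores, weight=0.5):
    """Interpolate each sentence's two scores, then average over the test set."""
    if len(ref_scores) != len(refless_scores) or not ref_scores:
        raise ValueError("need equal-length, non-empty score lists")
    return mean(interpolate(r, g, weight) for r, g in zip(ref_scores, refless_scores))


def corpus_level(matches, proposed):
    """Toy corpus-level ratio: pool counts over all sentences, then divide once."""
    total = sum(proposed)
    return sum(matches) / total if total else 0.0


def sentence_level(matches, proposed):
    """Toy sentence-level ratio: score each sentence separately, then average."""
    if len(matches) != len(proposed) or not matches:
        raise ValueError("need equal-length, non-empty count lists")
    return mean(m / p if p else 0.0 for m, p in zip(matches, proposed))


if __name__ == "__main__":
    # Illustrative numbers only: sentence 1 has 1 of 1 edits matched, sentence 2 has 9 of 10.
    print(corpus_level([1, 9], [1, 10]))    # pooled: 10/11, about 0.909
    print(sentence_level([1, 9], [1, 10]))  # averaged: (1.0 + 0.9) / 2, about 0.95
    print(system_score([0.8, 0.4], [0.6, 0.9], weight=0.5))  # about 0.675
```

The two aggregations diverge because pooling lets sentences with many proposed edits dominate the corpus-level ratio. Averaging gives every sentence equal weight.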
Similar resources
Reference-based Metrics can be Replaced with Reference-less Metrics in Evaluating Grammatical Error Correction Systems
In grammatical error correction (GEC), automatically evaluating system outputs requires gold-standard references, which must be created manually and thus tend to be both expensive and limited in coverage. To address this problem, a reference-less approach has recently emerged; however, previous reference-less metrics that consider only the criterion of grammaticality have not worked as well as ...
Human Evaluation of Grammatical Error Correction Systems
The paper presents the results of the first large-scale human evaluation of automatic grammatical error correction (GEC) systems. Twelve participating systems and the unchanged input of the CoNLL-2014 shared task have been reassessed in a WMT-inspired human evaluation procedure. Methods introduced for the Workshop of Machine Translation evaluation campaigns have been adapted to GEC and extended...
Ground Truth for Grammaticality Correction Metrics
How do we know which grammatical error correction (GEC) system is best? A number of metrics have been proposed over the years, each motivated by weaknesses of previous metrics; however, the metrics themselves have not been compared to an empirical gold standard grounded in human judgments. We conducted the first human evaluation of GEC system outputs, and show that the rankings produced by metr...
Grammatical Error Correction of English as Foreign Language Learners
This study aimed to gain insight into error correction by applying two correction systems to three Iranian university students. The three students were asked to write four in-class essays over the semester, in which their verb errors and individually selected errors were corrected using the Code Correction System and the Individual Correction System. At the end of the study, the...
The Impact of Immediate Grammatical Error Correction on Senior English Majors’ Accuracy at Hebron University
This study investigated the effects of grammatical error correction on EFL learners’ accuracy. Twenty-two senior students, both male and female, were randomly selected to respond to a questionnaire investigating their beliefs about immediate grammatical error correction. Specifically, the study set out to answer the question: what is the effect of grammatical error feedback on stu...